Abstract

A covering array CA(N;t,k,v) is an N × k array A whose cells take values from a v-set V called an alphabet. Moreover, the set $V^t$ is contained in the set of rows of every N × t subarray of A. The parameter N is called the size of the array, and CAN(t,k,v) denotes the smallest N for which a CA(N;t,k,v) exists. It is well known that $\mathrm{CAN}(t,k,v) = \Theta(\log_2 k)$ [8]. In this paper we derive two upper bounds on $d(t,v) = \limsup_{k\to\infty} \frac{\mathrm{CAN}(t,k,v)}{\log_2 k}$ using the algorithmic approach to the Lovász local lemma, also known as entropy compression.


1 Introduction 


A covering array CA(N;t,k,v) is an N × k array A whose cells take values from a set V of size v, such that the set of rows of every N × t subarray of A contains the whole set $V^t$. The parameter t is called the strength, the parameter v is the alphabet size, and N is called the size of the array. A covering array with given parameters t, k and v always exists. The two central questions regarding covering arrays are: what is the smallest number of rows, denoted by CAN(t,k,v), for which a covering array with the given set of parameters (t,k,v) exists, and how can an array of that size be constructed? In this paper we study upper bounds on the asymptotic size of covering arrays. It is easy to see that if t = 1 or v = 1, covering arrays are trivial. Hence we assume that t ≥ 2 and v ≥ 2.
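The definition can be checked mechanically. The minimal Python sketch below is our own illustration, not part of the paper: the function name `is_covering_array` and the representation of an array as a list of N row tuples over `range(v)` are assumptions. It inspects every N × t subarray, so it is only practical for small k and t.

```python
from itertools import combinations, product


def is_covering_array(rows, t, v):
    """Return True if every N x t subarray of `rows` contains all of V^t.

    `rows` is a list of N rows, each a tuple of k symbols from range(v).
    """
    k = len(rows[0])
    required = set(product(range(v), repeat=t))  # the whole set V^t
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in rows}
        if not required <= seen:
            return False
    return True


# A CA(4; 2, 3, 2): every pair of columns shows 00, 01, 10 and 11.
ca = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_covering_array(ca, t=2, v=2))  # True
```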


Covering arrays are best known for their applications in the software testing industry [13, 15] as interaction testing plans. There are numerous software tools for the construction of covering arrays [1], and there is a vast literature on them as well [2, 10, 13, 14]. However, the central question about the optimal size is far from fully answered. The only infinite family of covering arrays whose exact size is known is the first non-trivial family: arrays of strength t = 2 with alphabet size v = 2 [11, 12]. The best known upper bound on the size of a covering array for any set of parameters (t,k,v) is obtained by an application of the Lovász local lemma [7]. Together, these two results give us the asymptotic size of covering arrays when the strength t and the alphabet size v are fixed and the number of columns k varies.


*Research supported by NSERC grant 249777. 
Theorem 1 [8, 11, 12]. Let t, v ≥ 2 be integers. Then
$$\mathrm{CAN}(t,k,v) = \Theta(\log_2 k).$$


Given the previous theorem, there is significant interest in determining the following two values (we use the notation given in [14]):
$$c(t,v) = \liminf_{k\to\infty}\frac{\mathrm{CAN}(t,k,v)}{\log_2 k} \quad\text{and}\quad d(t,v) = \limsup_{k\to\infty}\frac{\mathrm{CAN}(t,k,v)}{\log_2 k}.$$

The exact value of d(t,v) is only known when t = 2.

Theorem 2 [7]. Let v ≥ 2 be an integer. Then $d(2,v) = \frac{v}{2}$.

However, covering arrays which meet this asymptotic size are hard to construct. The only family that we currently know how to construct and that attains this size is the already mentioned family of CAs with t = 2 and v = 2 [11].
In 1996, Godbole et al. [8] gave an upper bound on d(t,v) for any strength t ≥ 2.

Theorem 3 [8]. Let t ≥ 2 and v be positive integers. Then
$$d(t,v) \le \frac{t-1}{\log_2\left(\frac{v^t}{v^t-1}\right)}.$$


Recently, the method of entropy compression was successfully used in the context of vertex-colourings of graphs [6, 9] to improve on previous results which used the local lemma. In this paper we explore an application of this method in the context of covering arrays. We give a new upper bound on d(t,v) for any t ≥ 3 in Theorem 13, which improves Theorem 3. We also obtain a tighter upper bound on d(t,v), given in Lemma 14, which depends on further computational approximations. Table 1 displays our new upper bounds on d(t,v) for 2 ≤ t ≤ 6 and 2 ≤ v ≤ 10. Finally, we analyze these results and point out possible challenges and further avenues for improvement.


2 Algorithm 

We adapt the algorithms given in [6, 9] to covering arrays. The algorithm is used as a tool for counting. The main idea is to keep a record of the execution of the algorithm. This allows us to map each input sequence injectively to a pair consisting of the output array and the record of the execution. For a given input, we say that the execution was unsuccessful, and that it produced a bad output, if the output array is only partially filled, i.e. it has some empty columns. If the total number of possible input sequences is greater than the total number of bad output pairs, then there must exist an input sequence for which the algorithm terminates successfully. Before we give the algorithm, we need to introduce some notation which is required for the analysis.

Given array parameters N, t, k and v, the algorithm attempts to construct a covering array of size N × k one column at a time. A column of a CA(N;t,k,v) is an element of $V^N$, where V is the alphabet of size v. To remind us that these ordered N-tuples are columns, we denote elements of $V^N$ by c. Let $\mathcal{J} \subseteq V^N$ denote the set of all admissible input columns for the algorithm; we define $\mathcal{J}$ in Section 5. The algorithm receives as input a sequence $I \in \mathcal{J}^\ell$ of ℓ columns, where ℓ is the number of iterations to be performed. Let I(j) denote the j-th coordinate of I.

At an intermediate step of the algorithm, some columns of the array may still be empty. Let ∅ denote an empty column, and let $\mathcal{J}' = \mathcal{J} \cup \{\emptyset\}$. Then an array can be represented as a sequence of k elements of $\mathcal{J}'$, i.e. $A = (c_1, c_2, \dots, c_k)$, where $c_i \in \mathcal{J}'$. Let $A(i) = c_i$ be the values in the i-th column of the array. We also require a way to choose which column to fill at each step. Define φ(A) to be the priority function on the empty columns of A:
$$\varphi(A) = \begin{cases} -1, & \text{if there is no empty column in } A,\\ \min\{i : A(i) = \emptyset\}, & \text{otherwise.}\end{cases}$$

The key property of a covering array is that the subarray on any t columns contains each t-tuple in $V^t$ at least once. Let $\mathcal{T}$ be the set of all t-subsets of the set $[1,k] = \{1,2,\dots,k\}$. Let $\tau = \{i_1, i_2, \dots, i_t\} \in \mathcal{T}$, where $i_1 < i_2 < \dots < i_t$. Then denote by $A|_\tau = (c_{i_1}, c_{i_2}, \dots, c_{i_t})$ the subarray of A on the columns indexed by τ. An auxiliary function is_a_covering($A|_\tau$) returns true if the set of rows of $A|_\tau$ contains $V^t$ as a subset, and false otherwise.

Now we are ready to describe the algorithm, which attempts to construct a CA(N;t,k,v) for given positive integers N, t, k and v. It starts by initializing all columns of an N × k array A to be empty, and opens a new record file R. Then it runs for ℓ iterations, where ℓ is the length of the input sequence. At the beginning of each iteration, the partially constructed array A satisfies the covering property, i.e. every N × t subarray of A without empty columns is a covering. At step j, let i be the smallest index of an empty column of A. The algorithm assigns the j-th element of the input sequence to the i-th column. Now, if A has an N × t subarray on columns τ ⊆ [1,k] which has no empty columns and is not a covering, then i ∈ τ, since A met the covering property before the algorithm entered the j-th iteration. In this case the algorithm records $\bar\tau = \tau \setminus \{i\}$ and the content of the subarray of A on the columns in τ. Note that, in order to be able to recover the input from the output, we need to know the position of i relative to the other elements of τ, since i itself is not recorded; hence the elements of τ are first sorted in increasing order. Finally, since this subarray does not have the covering property, we assign empty values to all columns in τ. Otherwise, the addition of the new column preserves the covering property, and the algorithm completes this iteration after writing a 'successful entry' line to the file.

Note that the number of lines in the record file R is equal to the number of executed iterations. If the algorithm completes and the array A has no empty columns, then it is easy to see that A satisfies the covering property on every set of t columns, i.e. it is a covering array. Otherwise, A is only partially constructed, it has some empty columns, and we say that the execution of the algorithm on the given input was unsuccessful.

Data: I ∈ 𝒥^ℓ, where 𝒥 ⊆ V^N
Result: a (partial) covering array CA(N;t,k,v)

A := (∅, ∅, …, ∅)
R := new file()
for j := 1 to ℓ do
    i := φ(A)
    if i == −1 then
        break
    end
    else
        A(i) := I(j)
        good := true
        for all τ ∈ 𝒯 do
            if i ∈ τ and A|τ has no empty columns then
                good := is_a_covering(A|τ)
                if good == false then
                    sort(τ)   // so that τ(r_1) < τ(r_2) when r_1 < r_2
                    (i_1, i_2, …, i_t) := τ
                    h := index of i in τ   // i.e. i_h = i
                    τ̄ := (i_1, …, i_{h−1}, i_{h+1}, …, i_t)   // i.e. τ̄ = τ \ {i}
                    // record the content of A|τ and delete these columns
                    (c_1, c_2, …, c_t) := (A(i_1), A(i_2), …, A(i_t))
                    A|τ := (∅, ∅, …, ∅)
                    R.write('back-track - in columns: ', τ̄, ' deleted content: ', (c_1, c_2, …, c_t), '\n')
                    break   // break the loop over τ ∈ 𝒯
                end
            end
        end
        if good == true then
            R.write('successful entry\n')
        end
    end
end
return (A, R)

Algorithm 1: Entropy compression algorithm for the construction of a CA(N;t,k,v).
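To make the bookkeeping of Algorithm 1 concrete, here is a Python sketch that follows the pseudocode step by step. The representation is our own assumption rather than the paper's: a column is a tuple over `range(v)`, `None` plays the role of the empty column ∅, and the record file R is a list of tuples instead of lines of text. Only t-subsets that contain the newly filled column i are examined, since any non-covering subarray must contain i.

```python
from itertools import combinations


def is_a_covering(cols, v):
    """True if the rows of the N x t subarray formed by `cols` contain all of V^t."""
    rows = set(zip(*cols))           # distinct t-tuples appearing as rows
    return len(rows) == v ** len(cols)


def run_algorithm(inputs, t, k, v):
    """Sketch of Algorithm 1. Columns are tuples over range(v); None = empty."""
    A = [None] * k
    R = []                           # the record file: one entry per iteration
    for column in inputs:            # iteration j receives I(j)
        if None not in A:            # phi(A) == -1: no empty column is left
            break
        i = A.index(None)            # phi(A): smallest index of an empty column
        A[i] = column
        good = True
        others = [c for c in range(k) if c != i]
        for rest in combinations(others, t - 1):
            tau = sorted(rest + (i,))    # every t-subset containing i, sorted
            if any(A[c] is None for c in tau):
                continue
            if not is_a_covering([A[c] for c in tau], v):
                good = False
                content = tuple(A[c] for c in tau)
                tau_bar = tuple(c for c in tau if c != i)
                for c in tau:            # delete all t columns of the bad subarray
                    A[c] = None
                R.append(("back-track", tau_bar, content))
                break
        if good:
            R.append(("successful entry",))
    return A, R


# Usage: these three input columns already form a CA(4; 2, 3, 2).
cols = [(0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 0)]
A, R = run_algorithm(cols, t=2, k=3, v=2)
print(R)  # three 'successful entry' lines
```

The usage example feeds in the three columns of a CA(4;2,3,2), so no back-tracking occurs; feeding the same column twice instead triggers a 'back-track' line that empties the array again.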


3 Reversibility 

Next, we establish a bijection between the set $\mathcal{J}^\ell$ of all possible inputs and the set of all possible outputs $\mathcal{O}_\ell = \{(A,R) : (A,R) \text{ is obtained by the algorithm on an input } I \in \mathcal{J}^\ell\}$. It is easy to see that for an input sequence $I \in \mathcal{J}^\ell$ we get only one output (A,R). We prove the converse in several steps. Let $A_j$ denote the state of the array A at the beginning of the j-th iteration of Algorithm 1. Hence $A_1$ is an empty array, and $A_{\ell+1} = A$ is the array returned by the algorithm.

Lemma 4. Given $(A,R) \in \mathcal{O}_\ell$, we can determine the set of indices of all columns which are empty in $A_j$ for all $j \in \{1, 2, \dots, \ell\}$.

Proof. We use induction on j. Denote by $S_j$ the set of indices of all columns which are empty in $A_j$. When j = 1, $S_1 = [1,k]$, since the algorithm starts with an empty array $A_1$.

Assume that we know $S_j$ for some j < ℓ. Then $i = \min S_j = \varphi(A_j)$ is the index of the column which receives a value in the j-th iteration. If the j-th line of R is 'successful entry', then $S_{j+1} = S_j \setminus \{i\}$. Otherwise, the j-th line of R contains $\bar\tau = \tau \setminus \{i\}$, where τ is the set of columns whose content is removed at step j. Hence $S_{j+1} = S_j \cup \bar\tau$. □

The following is an immediate corollary. 

Corollary 5. We can determine $\varphi(A_j)$ for all $j \in [1,\ell]$ from an output $(A,R) \in \mathcal{O}_\ell$ of the algorithm.

Next, we determine $A_j$ at each step of the algorithm from the output values.

Lemma 6. Given $(A,R) \in \mathcal{O}_\ell$, we can deduce $A_j$ for all $j \in [1,\ell+1]$.

Proof. The proof is by reverse induction. When $j = \ell+1$, $A_j = A$, the output of the algorithm. Assume that we know $A_{j+1}$ for some $j \le \ell$. By Corollary 5 we know $i = \varphi(A_j)$. We have two cases to consider. If the j-th line of R is 'successful entry', then $A_j$ is obtained by deleting the content of column i in $A_{j+1}$. Otherwise, the j-th line of R contains $\bar\tau$, the indices of all but one of the columns whose content is deleted at step j of the algorithm. It also contains the content of all t of these columns, $(c_{i_1}, c_{i_2}, \dots, c_{i_t}) \in \mathcal{J}^t$, where $\bar\tau \cup \{i\} = \{i_1, i_2, \dots, i_t\}$ with $i_{r_1} < i_{r_2}$ when $r_1 < r_2$. Then $A_j$ is obtained from $A_{j+1}$ by the assignment $A_j(i_r) = c_{i_r}$ for all $r \in [1,t]$ with $i_r \ne i$, while column i stays empty, since it was empty at the beginning of the j-th iteration. □

Finally, we are ready to prove the reversibility: given an output, we can obtain the unique 
input sequence for the algorithm. 

Lemma 7. Given $(A,R) \in \mathcal{O}_\ell$, there is a unique input sequence $I \in \mathcal{J}^\ell$ such that Algorithm 1 produces (A,R) on input I.

Proof. The proof is by induction on ℓ. If ℓ = 1, then I = A(1). Assume that the statement is true for some ℓ ≥ 1. Let $(A,R) \in \mathcal{O}_{\ell+1}$ and denote by I the desired input sequence. Let R' be the record R without its last line. By Lemma 6 we know the value of $A_{\ell+1}$, and $(A_{\ell+1}, R') \in \mathcal{O}_\ell$. By the induction hypothesis, there is a unique input sequence $I' \in \mathcal{J}^\ell$ such that the algorithm gives $(A_{\ell+1}, R')$ on input I'. Then I(j) = I'(j) for $j \in [1,\ell]$. It remains to determine I(ℓ+1).

If the last line of R is 'successful entry', then it must be that $I(\ell+1) = A(\varphi(A_{\ell+1}))$, where $\varphi(A_{\ell+1})$ is given by Corollary 5. Otherwise, the last line of R contains $\bar\tau$ and $(c_1, c_2, \dots, c_t)$. As before, let $\bar\tau \cup \{\varphi(A_{\ell+1})\} = \{i_1, i_2, \dots, i_t\}$, where $i_{r_1} < i_{r_2}$ when $r_1 < r_2$. Let h be such that $\varphi(A_{\ell+1}) = i_h$. Then $I(\ell+1) = c_h$, which is uniquely determined. □
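Lemmas 4 to 7 amount to a decoding procedure, sketched below in Python as an illustration. The names `encode` and `decode` are hypothetical; `encode` is a compact restatement of Algorithm 1 with the same assumed conventions (tuples for columns, `None` for ∅, a list of tuples for R). `decode` first replays R forwards to recover φ(A_j) (Lemma 4 and Corollary 5) and then walks backwards from A_{ℓ+1} = A, rebuilding each A_j and I(j) (Lemmas 6 and 7).

```python
import random
from itertools import combinations


def encode(inputs, t, k, v):
    """Compact Algorithm 1: columns are tuples over range(v), None is the empty column."""
    A, R = [None] * k, []
    for col in inputs:
        if None not in A:                      # phi(A) == -1: array is complete
            break
        i = A.index(None)                      # phi(A)
        A[i] = col
        line = ("successful entry",)
        for rest in combinations([c for c in range(k) if c != i], t - 1):
            tau = sorted(rest + (i,))
            cols = [A[c] for c in tau]
            if None not in cols and len(set(zip(*cols))) < v ** t:
                line = ("back-track", tuple(c for c in tau if c != i), tuple(cols))
                for c in tau:
                    A[c] = None
                break
        R.append(line)
    return A, R


def decode(A, R, k):
    """Recover the consumed input columns from the output (A, R)."""
    # Lemma 4 and Corollary 5: replay R forwards to find phi(A_j) for each line j.
    empty, phi = set(range(k)), []
    for line in R:
        i = min(empty)
        phi.append(i)
        if line[0] == "successful entry":
            empty.discard(i)
        else:
            empty |= set(line[1])              # i stays empty, tau_bar is emptied
    # Lemmas 6 and 7: walk backwards from A_{l+1} = A, restoring A_j and I(j).
    state, inputs = list(A), [None] * len(R)
    for j in range(len(R) - 1, -1, -1):
        i, line = phi[j], R[j]
        if line[0] == "successful entry":
            inputs[j] = state[i]
            state[i] = None
        else:
            tau = sorted(line[1] + (i,))
            contents = line[2]
            inputs[j] = contents[tau.index(i)]
            for c, column in zip(tau, contents):
                state[c] = None if c == i else column
    return inputs


rng = random.Random(7)
t, k, v, N = 2, 5, 2, 6
inputs = [tuple(rng.randrange(v) for _ in range(N)) for _ in range(30)]
A, R = encode(inputs, t, k, v)
print(decode(A, R, k) == inputs[:len(R)])  # True
```

Only the first len(R) input columns are recovered, because the algorithm stops reading its input once the array is full.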


4 Algorithm analysis 

In Section 3 we established a bijection between the set $\mathcal{J}^\ell$ of inputs and the set $\mathcal{O}_\ell$ of outputs of Algorithm 1. Next, we want to show that when a given set of covering array parameters satisfies certain conditions and ℓ is big enough, the total number of inputs to the algorithm is greater than the number of outputs which have exactly ℓ lines in the record file (these include all unsuccessful executions). Hence, the algorithm will terminate successfully and output a covering array with the desired parameters for some input sequence.

We start by finding an upper bound on the size of the set $\mathcal{R}_\ell$ of all possible record files R with ℓ lines which can be output by Algorithm 1. Let $\ell_0$ be the number of 'successful entry' lines, and $\ell_1$ the number of 'back-track' lines. Then $\ell = \ell_0 + \ell_1$, and these lines can be positioned in the record file in $\binom{\ell}{\ell_0}$ ways. Denote by $C_1$ the number of distinct pairs $(\bar\tau, (c_1, c_2, \dots, c_t))$ which can appear in a 'back-track' line. Then
$$|\mathcal{R}_\ell| \le \sum_{\ell_0+\ell_1=\ell}\binom{\ell}{\ell_0}C_1^{\ell_1}.$$

Now we can apply the following result from [9].

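Theorem 8 [9, Corollary 19]. Let ℓ and p be positive integers. Let $s_i \in \mathbb{Z}^+$ and $C_i \ge 1$, $C_i \in \mathbb{R}$, for $i \in [1,p]$. Define
$$B_\ell(\ell_0,\ell_1,\dots,\ell_p) = \binom{\ell}{\ell_0,\ell_1,\dots,\ell_p}\prod_{i=1}^{p}C_i^{\ell_i},$$
where each $\ell_i$ is a non-negative integer, $i \in [0,p]$, $\ell = \sum_{i=0}^{p}\ell_i$ and $\ell \ge \sum_{i=1}^{p}s_i\ell_i$. Then
$$\sum_{\ell_0,\ell_1,\dots,\ell_p}B_\ell(\ell_0,\ell_1,\dots,\ell_p) \le \ell(\ell+1)^p\left(\inf_{x>0}\frac{Q(x)}{x}\right)^{\ell},$$
where
$$Q(x) = 1 + \sum_{i=1}^{p}C_i x^{s_i}.$$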

Corollary 9. Let $C_1$ be the number of distinct pairs $(\bar\tau, (c_1, c_2, \dots, c_t))$ which can be recorded in a 'back-track' line in an execution of Algorithm 1. Then
$$|\mathcal{R}_\ell| \le \ell(\ell+1)\left(\frac{t}{t-1}\right)^{\ell}\big((t-1)C_1\big)^{\ell/t}.$$

Proof. We apply Theorem 8 with p = 1, and we only need to determine the value of $s_1$. Note that the algorithm cannot back-track unless there are t non-empty columns in A. Since the total number of columns added to A is ℓ, one at each iteration, and the total number of deleted columns is $t\ell_1$, we have $\ell \ge t\ell_1$. Thus, let $s_1 = t$. Setting the first derivative of $Q(x)/x = 1/x + C_1x^{t-1}$ to zero gives $x^t = 1/((t-1)C_1)$ and the minimum value $\frac{t}{t-1}\big((t-1)C_1\big)^{1/t}$, and the result follows. □
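The minimization at the end of this proof can be double-checked numerically. The Python sketch below (function names are ours) assumes Q(x) = 1 + C_1 x^t, the p = 1 case of Theorem 8 with s_1 = t, and compares a ternary search for the minimum of Q(x)/x over (0, 1] with the closed form (t/(t−1))((t−1)C_1)^{1/t}. Restricting the search to x ≤ 1 is harmless here, because the minimizer ((t−1)C_1)^{−1/t} is at most 1 whenever (t−1)C_1 ≥ 1.

```python
import math


def q_over_x(x, c1, t):
    """Q(x)/x for Q(x) = 1 + C_1 x^t, i.e. Theorem 8 with p = 1 and s_1 = t."""
    return (1 + c1 * x ** t) / x


def closed_form_min(c1, t):
    """The minimum (t/(t-1)) * ((t-1) C_1)^(1/t) derived in Corollary 9."""
    return t / (t - 1) * ((t - 1) * c1) ** (1 / t)


def numeric_min(c1, t, lo=1e-9, hi=1.0, iters=200):
    """Ternary search on (lo, hi]; Q(x)/x = 1/x + C_1 x^(t-1) is convex for x > 0."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if q_over_x(m1, c1, t) < q_over_x(m2, c1, t):
            hi = m2
        else:
            lo = m1
    return q_over_x((lo + hi) / 2, c1, t)


for t in (2, 3, 6):
    print(t, numeric_min(1e4, t), closed_form_min(1e4, t))
```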



Finally, we give a lemma which is going to be our main tool in further analysis. 

Lemma 10. Given positive integers N, t, k and v, and a set $\mathcal{J} \subseteq V^N$, where |V| = v, there exists a CA(N;t,k,v) whose columns are elements of the set $\mathcal{J}$ if
$$\left(\frac{t}{t-1}\right)^{t}(t-1)\,C_1 < |\mathcal{J}|^t,$$
where $C_1$ is the number of distinct pairs $(\bar\tau, (c_1, c_2, \dots, c_t))$ which can be recorded in a 'back-track' line in an execution of Algorithm 1.

Proof. Denote by $\mathcal{O}'_\ell = \{(A,R) \in \mathcal{O}_\ell : R \in \mathcal{R}_\ell\}$ the subset of all possible outputs of the algorithm which have exactly ℓ lines in the record file R. Since an output array A has k columns, each of which is either empty or in $\mathcal{J}$, we have
$$|\mathcal{O}'_\ell| \le (|\mathcal{J}|+1)^k\,|\mathcal{R}_\ell| = O\!\left(\ell^2\left(\frac{t}{t-1}\right)^{\ell}\big((t-1)C_1\big)^{\ell/t}\right) \quad \text{(by Corollary 9)}$$
(as a function of ℓ).

Note that $C_1 = C_1(N,t,k,v)$, and hence it is a constant with respect to ℓ. Therefore, by the assumption of the lemma, for sufficiently large ℓ we have $|\mathcal{O}'_\ell| < |\mathcal{J}|^\ell$, and $|\mathcal{J}|^\ell$ equals the total number of possible inputs of length ℓ for the algorithm. Since $|\mathcal{O}_\ell| = |\mathcal{J}|^\ell$ by Lemma 7, there exists an input whose record file has fewer than ℓ lines; on this input the algorithm fills every column before the input is exhausted, and hence it outputs a CA(N;t,k,v). □


In the following section, we apply Lemma 10 to derive an upper bound on the asymptotic size of covering arrays.


5 Balanced covering arrays of any strength t 


To demonstrate how Lemma 10 can be applied, we start with an easy example: a construction of a covering array of arbitrary strength. The main difficulty in the application of Lemma 10 is to give a good upper bound on the value of $C_1$. In Section 6 we will strengthen the general result, in particular in the cases t = 2 and t = 3.

Recall that $C_1$ equals the number of distinct pairs $(\bar\tau, (c_1, c_2, \dots, c_t))$ which may appear in a 'back-track' line of the record file of Algorithm 1 for the parameter set (N;t,k,v). A 'back-track' line is recorded only when the array $(c_1, c_2, \dots, c_t)$ is not a covering. Hence
$$C_1 = \binom{k}{t-1}\,|\mathcal{A}_t|,$$
where $\mathcal{A}_t$ is the set of all N × t arrays over the alphabet V of size v, with columns in $\mathcal{J}$, such that for every array in $\mathcal{A}_t$ there is at least one element of $V^t$ which is not contained in the set of rows of the array.

Taking the input set $\mathcal{J} = V^N$ to be the set of all possible N-tuples over the alphabet V, we can easily obtain an upper bound on the size of a covering array using Lemma 10 which is almost identical to the one derived using the Lovász local lemma [8]. This bound is improved if instead we take $\mathcal{J}$ to be the set of balanced columns: N-tuples in which every alphabet symbol appears an equal number of times. Hence, from now on, we assume that N = mv for some m, and that $\mathcal{J}$ is the set of balanced columns. Therefore $|\mathcal{J}| = \binom{mv}{m,\,m,\,\dots,\,m} = \frac{(mv)!}{(m!)^v}$. A balanced covering array is a covering array whose columns are elements of $\mathcal{J}$.
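As a sanity check on the formula for |J|, the following Python sketch (names are ours) counts balanced columns in three ways: by the multinomial coefficient, by a product of binomial coefficients binom(mv, m)·binom(m(v−1), m)···binom(2m, m), and by brute-force enumeration of V^N for tiny parameters.

```python
from itertools import product
from math import comb, factorial


def num_balanced_columns(m, v):
    """|J| = (mv)! / (m!)^v: length-mv columns using each of v symbols exactly m times."""
    return factorial(m * v) // factorial(m) ** v


def num_balanced_as_binomials(m, v):
    """The same count as binom(mv, m) * binom(m(v-1), m) * ... * binom(2m, m)."""
    result = 1
    for i in range(v, 1, -1):
        result *= comb(i * m, m)
    return result


def num_balanced_brute_force(m, v):
    """Enumerate V^N directly (only feasible for tiny m and v)."""
    return sum(1 for col in product(range(v), repeat=m * v)
               if all(col.count(s) == m for s in range(v)))


print(num_balanced_columns(2, 3))  # 90
```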

We also require some approximations of the binomial coefficient, which we use in the subsequent sections.


Lemma 11 [16, Theorems 2.6 and 2.8]. Let m, v ∈ ℤ, v ≥ 2 and m ≥ 2. Then
$$l(v)\,m^{-1/2}\,\frac{v^{mv}}{(v-1)^{(v-1)m}} \le \binom{mv}{m} \le u(v)\,m^{-1/2}\,\frac{v^{mv}}{(v-1)^{(v-1)m}},$$
where l(v) and u(v) are positive constants that depend only on v.

Lemma 12 [4]. For any positive integers n ≥ m,
$$\frac{1}{\sqrt{2n}}\,2^{n h(m/n)} \le \binom{n}{m} \le 2^{n h(m/n)},$$
where $h(x) = -x\log_2(x) - (1-x)\log_2(1-x)$ for 0 < x < 1.
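These binomial bounds are easy to spot-check numerically. The Python sketch below is our own, not taken from [4]; it tests 2^{nh(m/n)}/√(2n) ≤ binom(n, m) ≤ 2^{nh(m/n)} for all 0 ≤ m ≤ n ≤ 40, using the convention h(0) = h(1) = 0 at the endpoints.

```python
from math import comb, log2, sqrt


def h(x):
    """Binary entropy in bits, with the convention h(0) = h(1) = 0."""
    if x == 0 or x == 1:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def lemma12_holds(n, m):
    """Check 2^(n h(m/n)) / sqrt(2n) <= binom(n, m) <= 2^(n h(m/n))."""
    upper = 2 ** (n * h(m / n))
    lower = upper / sqrt(2 * n)
    return lower <= comb(n, m) <= upper


print(all(lemma12_holds(n, m) for n in range(1, 41) for m in range(n + 1)))  # True
```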


We will apply Lemma 11 for parameters v and m, where v denotes the alphabet size and m denotes the number of occurrences of each symbol within a column. Recall that covering arrays are trivial when either v = 1 or t = 1. Also, for any covering array, an obvious lower bound is $N = mv \ge v^t$, so $m \ge v^{t-1} \ge v$ for all t ≥ 2. Hence, the conditions of Lemma 11 always hold for non-trivial parameter sets.


Our first application of Lemma 10 is to the most general case, when the strength of a covering array is any integer t ≥ 2.

Theorem 13. Let t and v be positive integers, t, v ≥ 2. Then
$$d(t,v) \le \frac{v(t-1)}{\log_2\left(\frac{v^{t-1}}{v^{t-1}-1}\right)}.$$


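Proof. Let V be the alphabet. Let k > t and m be positive integers, and let $\mathcal{J} \subseteq V^{mv}$ be the set of balanced columns. Since $C_1 = \binom{k}{t-1}|\mathcal{A}_t| \le k^{t-1}|\mathcal{A}_t|$, if m is such that
$$\left(\frac{t}{t-1}\right)^{t}\frac{(t-1)\cdot k^{t-1}\cdot|\mathcal{A}_t|}{|\mathcal{J}|^t} < 1, \qquad (5.1)$$
then by Lemma 10 there exists a balanced CA(mv;t,k,v).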

Now,
$$|\mathcal{A}_t| \le v^t\cdot|\mathcal{J}|\cdot\left(v^{t-1}-1\right)^{m}\cdot\left(v^{t-1}\right)^{m(v-1)}.$$

Indeed, if $A \in \mathcal{A}_t$, then the following properties hold.

• There are $v^t$ choices for an element $(a_1, a_2, \dots, a_t) \in V^t$ which is not covered by the rows of A.

• The first column of A can be any element of the input set $\mathcal{J}$.

• The m rows of A having $a_1$ in the first column cannot contain the ordered (t−1)-tuple $(a_2, a_3, \dots, a_t)$ in the remaining cells.

• All other rows of the array obtained from A by removing the first column can contain any element of $V^{t-1}$.
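The over-count in this bound on |A_t| can be compared with the exact value for tiny parameters. The brute-force Python sketch below (function names are ours) enumerates all N × t arrays with balanced columns, counts those that are not coverings, and compares the result with v^t·|J|·(v^{t−1}−1)^m·(v^{t−1})^{m(v−1)}. It is exponential in N and meant only as an illustration.

```python
from itertools import product


def balanced_columns(m, v):
    """All columns of length N = mv in which each symbol of range(v) occurs m times."""
    return [c for c in product(range(v), repeat=m * v)
            if all(c.count(s) == m for s in range(v))]


def count_non_covering(m, v, t):
    """|A_t| by brute force: N x t arrays with balanced columns missing some t-tuple."""
    J = balanced_columns(m, v)
    return sum(1 for cols in product(J, repeat=t)
               if len(set(zip(*cols))) < v ** t)


def theorem13_count_bound(m, v, t):
    """The over-count v^t |J| (v^(t-1) - 1)^m (v^(t-1))^(m(v-1)) from the proof."""
    size_j = len(balanced_columns(m, v))
    return v ** t * size_j * (v ** (t - 1) - 1) ** m * (v ** (t - 1)) ** (m * (v - 1))


for m, v, t in [(2, 2, 2), (3, 2, 2), (2, 3, 2), (2, 2, 3)]:
    print((m, v, t), count_non_covering(m, v, t), theorem13_count_bound(m, v, t))
```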


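By Lemma 11,
$$|\mathcal{J}| = \binom{mv}{m}\binom{m(v-1)}{m}\cdots\binom{2m}{m} \ge \left(\prod_{i=2}^{v}l(i)\right)m^{-(v-1)/2}\,v^{mv},$$
hence
$$\left(\frac{t}{t-1}\right)^{t}\frac{(t-1)\cdot k^{t-1}\cdot|\mathcal{A}_t|}{|\mathcal{J}|^t} \le \left(\frac{t}{t-1}\right)^{t}(t-1)\,v^t\,k^{t-1}\,\frac{\left(v^{t-1}-1\right)^{m}v^{(t-1)m(v-1)}}{|\mathcal{J}|^{t-1}} \le M(v,t)\,k^{t-1}\,m^{(v-1)(t-1)/2}\left(\frac{v^{t-1}-1}{v^{t-1}}\right)^{m}, \qquad (5.2)$$
where $M(v,t) = \left(\frac{t}{t-1}\right)^{t}(t-1)\,v^t\left(\prod_{i=2}^{v}l(i)\right)^{1-t}$.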

For fixed covering array parameters (t,k,v), the right-hand side of inequality (5.2) is a function of m whose dominant term is exponential with base smaller than 1. Let m be the smallest positive integer for which the right-hand side of inequality (5.2) is smaller than 1. Then inequality (5.1) is satisfied, and so there exists a balanced CA(mv;t,k,v). Since m is the smallest such integer, the right-hand side of (5.2) is at least 1 for m − 1, that is,
$$M(v,t)\,k^{t-1}\,(m-1)^{(v-1)(t-1)/2}\left(\frac{v^{t-1}-1}{v^{t-1}}\right)^{m-1} \ge 1.$$
Taking the logarithm of both sides, dividing by $\log_2 k$ and letting $k \to \infty$, we get
$$\limsup_{k\to\infty}\frac{mv}{\log_2 k} \le \frac{v(t-1)}{\log_2\left(\frac{v^{t-1}}{v^{t-1}-1}\right)}.$$
Note that $\lim_{k\to\infty}\frac{\log_2 m}{\log_2 k} = 0$ by Theorem 1. Finally, since CAN(t,k,v) is at most the size of a balanced CA(mv;t,k,v), we get the stated upper bound on d(t,v).

□
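Theorem 3 and Theorem 13 are closed-form expressions, so the first two rows of Table 3 can be recomputed directly. A minimal Python sketch (function names are ours):

```python
from math import log2


def theorem3_bound(t, v):
    """Godbole et al.: d(t, v) <= (t - 1) / log2(v^t / (v^t - 1))."""
    return (t - 1) / log2(v ** t / (v ** t - 1))


def theorem13_bound(t, v):
    """Balanced columns: d(t, v) <= v (t - 1) / log2(v^(t-1) / (v^(t-1) - 1))."""
    return v * (t - 1) / log2(v ** (t - 1) / (v ** (t - 1) - 1))


for v in range(2, 8):
    print(v, round(theorem3_bound(6, v), 2), round(theorem13_bound(6, v), 2))
```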






















6 Tighter bound on d(t,v)


The main difficulty in computing the value of $C_1$ is counting the N × t arrays over an alphabet V which are not covering arrays. We can obtain a multivariable function in t − 2 variables to approximate $C_1$ from above. When t = 2 and t = 3 we get closed-form bounds, and for higher values of t we obtain these bounds using mathematical software for non-linear optimization.

For the purposes of the following lemma, let $f_{t,v}$ be the following function on the domain $(0,1)^{t-1}$:
$$f_{t,v}(x_1,x_2,\dots,x_{t-1}) = \log_2\frac{(v-x_{t-1})^{(v-x_{t-1})}}{(v-1-x_{t-1})^{(v-1-x_{t-1})}} + \sum_{i=1}^{t-2}\log_2\frac{x_i^{x_i}\,(v-x_i)^{(v-x_i)}}{(v-1-x_i+x_{i+1})^{(v-1-x_i+x_{i+1})}\,(x_i-x_{i+1})^{(x_i-x_{i+1})}\,x_{i+1}^{x_{i+1}}\,(1-x_{i+1})^{(1-x_{i+1})}}.$$


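Lemma 14. Let t, v ≥ 2 be integers and let
$$f_0(t,v) = \max f_{t,v}(x_1,x_2,\dots,x_{t-1}),$$
where the maximum is taken over $1 = x_1 \ge x_2 \ge \dots \ge x_{t-1} \ge 0$. Then d(2,2) = 1 and, when tv > 4,
$$d(t,v) \le \frac{(t-1)v}{(t-1)\log_2\left(\frac{v^v}{(v-1)^{v-1}}\right) - f_0(t,v)}.$$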


Proof. As before, let V be the alphabet of size v, and let k > t be an integer. We need to bound the size of $\mathcal{A}_t$. The set of rows of an array $A \in \mathcal{A}_t$ does not contain some t-tuple in $V^t$, which we denote by $(a_1, a_2, \dots, a_t) \in V^t$. Next, we count the number of occurrences of the 1-tuple $(a_1)$ in the first column of A, the number of occurrences of the 2-tuple $(a_1, a_2)$ in the first two columns of A, and so on. Let $0 \le x_i \le 1$ be such that the subarray of A restricted to columns 1 through i contains exactly $mx_i$ rows equal to $(a_1, a_2, \dots, a_i)$, where $i \in [1,t]$. We know that $x_1 = 1$ and $x_t = 0$, since the columns of A are balanced and A does not cover $(a_1, a_2, \dots, a_t)$. Also, note that $x_i \ge x_{i+1}$ for all i.

The first column of A can be chosen arbitrarily. Any other column i ≥ 2 contains $mx_i$ cells with value $a_i$ within the $mx_{i-1}$ rows which contain $(a_1, a_2, \dots, a_{i-1})$ in the previously chosen columns, and its remaining $m(1-x_i)$ cells with value $a_i$ lie among the other $m(v-x_{i-1})$ rows. Hence, the i-th column of A can be completed in at most
$$\binom{mx_{i-1}}{mx_i}\binom{m(v-x_{i-1})}{m(1-x_i)}\frac{(m(v-1))!}{(m!)^{v-1}}$$
ways.


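If (t,v) ≠ (2,2), then, using Lemmas 11 and 12 and noting that there are at most $(m+1)^{t-2}$ choices for $(x_2, \dots, x_{t-1})$, we get
$$\left(\frac{t}{t-1}\right)^{t}\frac{(t-1)\cdot k^{t-1}\cdot|\mathcal{A}_t|}{|\mathcal{J}|^t} \le \left(\frac{t}{t-1}\right)^{t}(t-1)\cdot k^{t-1}\cdot v^t\,(m+1)^{t-2}\max_{x_2,\dots,x_{t-1}}\frac{\prod_{i=2}^{t-1}\binom{mx_{i-1}}{mx_i}\binom{m(v-x_{i-1})}{m(1-x_i)}\cdot\binom{m(v-x_{t-1})}{m}}{\binom{mv}{m}^{t-1}}$$
$$\le \left(\frac{t}{t-1}\right)^{t}(t-1)\,v^t\,l(v)^{1-t}\,(m+1)^{t-2}\,m^{(t-1)/2}\cdot k^{t-1}\cdot 2^{m f_0(t,v)}\left(\frac{(v-1)^{v-1}}{v^v}\right)^{m(t-1)}, \qquad (6.1)$$
where we used the lower bound of Lemma 11 for $\binom{mv}{m}$, and the upper bound $\binom{n}{j} \le 2^{n h(j/n)}$ of Lemma 12 to bound the numerator by $2^{m f_{t,v}(x_1,\dots,x_{t-1})}$ with
$$f_{t,v}(x_1,\dots,x_{t-1}) = \sum_{i=2}^{t-1}\left[x_{i-1}\,h\!\left(\frac{x_i}{x_{i-1}}\right) + (v-x_{i-1})\,h\!\left(\frac{1-x_i}{v-x_{i-1}}\right)\right] + (v-x_{t-1})\,h\!\left(\frac{1}{v-x_{t-1}}\right).$$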
In the last inequality, the dominant term is an exponential function of m. Following the same reasoning as in the proof of Theorem 13, we get the stated upper bound on d(t,v).


Using the definition of the entropy function h, one can rewrite $f_{t,v}$ in the explicit form given before Lemma 14. Also note that $x_1 = 1$, so $x_1$ is a dummy variable of $f_{t,v}$.
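The agreement of the two expressions for f_{t,v} can be checked numerically. In the Python sketch below (names are ours), `f_entropy` follows the h-based expression and `f_explicit` the product-of-powers expression, with the convention 0·log₂0 = 0; the sample points are arbitrary feasible choices with x_1 = 1 > x_2 > ⋯ > 0.

```python
from math import log2


def xlog2x(x):
    """x * log2(x), with the usual convention 0 * log2(0) = 0."""
    return 0.0 if x == 0 else x * log2(x)


def h(x):
    """Binary entropy in bits."""
    return -xlog2x(x) - xlog2x(1 - x)


def f_entropy(xs, v):
    """f_{t,v} written with the entropy function h; xs = (x_1, ..., x_{t-1})."""
    total = 0.0
    for a, b in zip(xs, xs[1:]):          # a = x_{i-1}, b = x_i
        total += a * h(b / a) + (v - a) * h((1 - b) / (v - a))
    last = xs[-1]
    return total + (v - last) * h(1 / (v - last))


def f_explicit(xs, v):
    """The same function in the explicit 'log2 of powers' form."""
    last = xs[-1]
    total = xlog2x(v - last) - xlog2x(v - 1 - last)
    for a, b in zip(xs, xs[1:]):          # a = x_i, b = x_{i+1}
        total += (xlog2x(a) + xlog2x(v - a) - xlog2x(v - 1 - a + b)
                  - xlog2x(a - b) - xlog2x(b) - xlog2x(1 - b))
    return total


print(f_entropy((1.0, 0.6, 0.3), 2), f_explicit((1.0, 0.6, 0.3), 2))
```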

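If (t,v) = (2,2), then, since $x_1 = 1$ and $x_2 = 0$, there is $\binom{m}{0}\binom{m}{m}\binom{m}{m} = 1$ choice for the second column once the first column and the uncovered pair $(a_1,a_2)$ are fixed, and hence $|\mathcal{A}_t| \le v^t|\mathcal{J}| = 4|\mathcal{J}|$. Note that this is the only case in which, for a fixed uncovered t-tuple, we count the N × t arrays which are not coverings of strength t exactly. Using Lemma 11,
$$\left(\frac{t}{t-1}\right)^{t}\frac{(t-1)\cdot k^{t-1}\cdot|\mathcal{A}_t|}{|\mathcal{J}|^t} \le \frac{16k}{|\mathcal{J}|} \le \frac{16\,m^{1/2}\,k}{l(2)\,2^{2m}}.$$
As before, taking the smallest m for which the right-hand side of the last inequality is smaller than 1, it follows that d(2,2) ≤ 1, which is the exact value of d(2,2) [11, 12]. □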


Observe that $f_{2,v}$ is a constant function, since $x_1 = 1$, and $f_{3,v}$ is a function of the single variable $x_2$, so we can easily obtain its maximum by taking the first derivative of $f_{3,v}$. The same result can be obtained by using the Lovász local lemma directly [7].


Corollary 15. Let v ≥ 2 be an integer. Then d(2,2) = 1 and, for v ≥ 3,
$$d(2,v) \le \frac{v}{\log_2\left(\frac{v^v(v-2)^{v-2}}{(v-1)^{2(v-1)}}\right)}.$$
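Corollary 15 can be evaluated directly. The Python sketch below (names are ours) computes the bound in log space, compares it with the Corollary 15 row of Table 2 (values copied from the paper), and also checks that it stays below v(v−1)·ln 2 for 3 ≤ v ≤ 30.

```python
from math import log, log2


def xlog2x(x):
    """x * log2(x), with 0 * log2(0) = 0."""
    return 0.0 if x == 0 else x * log2(x)


def corollary15_bound(v):
    """v / log2(v^v (v-2)^(v-2) / (v-1)^(2(v-1))), evaluated in log space."""
    return v / (xlog2x(v) + xlog2x(v - 2) - 2 * xlog2x(v - 1))


table_row = [1, 3.97, 8.16, 13.72, 20.65, 28.98, 38.68, 49.78, 62.25]  # Table 2
for v, tabulated in zip(range(2, 11), table_row):
    print(v, round(corollary15_bound(v), 4), tabulated, v * (v - 1) * log(2))
```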

Corollary 16. Let v ≥ 2 be an integer. Then
$$d(3,v) \le \frac{2v}{2\log_2\left(\frac{v^v}{(v-1)^{v-1}}\right) - f_{3,v}(1,\xi)},$$
where $\xi = \frac{1}{2}\left(1+v-\sqrt{v^2+2v-3}\right)$.


Proof. The function
$$f_{3,v}(1,x_2) = \log_2\frac{(v-x_2)^{(v-x_2)}}{(v-1-x_2)^{(v-1-x_2)}\,x_2^{x_2}} + \log_2\frac{(v-1)^{(v-1)}}{(v-2+x_2)^{(v-2+x_2)}\,(1-x_2)^{2(1-x_2)}}$$
is maximized at $x_2 = \xi = \frac{1}{2}\left(1+v-\sqrt{v^2+2v-3}\right) < 1$. It is then straightforward to apply Lemma 14. □
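The following Python sketch (names are ours) evaluates ξ, checks that it beats nearby points of f_{3,v}(1, x_2), and plugs it into Lemma 14 with t = 3; for v = 2 and v = 3 the result can be compared with the corresponding entries 7.56 and 32.03 of Table 1.

```python
from math import log2, sqrt


def xlog2x(x):
    """x * log2(x), with 0 * log2(0) = 0."""
    return 0.0 if x == 0 else x * log2(x)


def f3(x2, v):
    """f_{3,v}(1, x2), the single-variable function from the proof of Corollary 16."""
    return (xlog2x(v - x2) - xlog2x(v - 1 - x2) - xlog2x(x2)
            + xlog2x(v - 1) - xlog2x(v - 2 + x2) - 2 * xlog2x(1 - x2))


def xi(v):
    """The maximizer (1 + v - sqrt(v^2 + 2v - 3)) / 2."""
    return (1 + v - sqrt(v * v + 2 * v - 3)) / 2


def corollary16_bound(v):
    """Lemma 14 with t = 3: 2v / (2 log2(v^v / (v-1)^(v-1)) - f_{3,v}(1, xi))."""
    base = xlog2x(v) - xlog2x(v - 1)
    return 2 * v / (2 * base - f3(xi(v), v))


for v in (2, 3, 4):
    print(v, round(xi(v), 4), round(corollary16_bound(v), 2))
```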


For t ≥ 4, $f_{t,v}$ is a function of several variables. We used a successive quadratic programming solver in Octave to compute $f_0(t,v)$. Table 1 gives the values of the upper bounds on d(t,v) obtained in Corollaries 15 and 16, and by computational optimization for 4 ≤ t ≤ 6.
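As a rough, hypothetical stand-in for the Octave solver, the Python sketch below maximizes f_{t,v} over 1 = x_1 ≥ x_2 ≥ ⋯ ≥ x_{t−1} ≥ 0 by derivative-free hill climbing with random restarts, and plugs the estimate of f_0(t,v) into the bound of Lemma 14. It is not the authors' method, and its estimates for t ≥ 4 are not guaranteed to reproduce Table 1.

```python
import random
from math import log2


def xlog2x(x):
    """x * log2(x), with 0 * log2(0) = 0 (also guards against tiny round-off)."""
    return 0.0 if x <= 0 else x * log2(x)


def f_tv(xs, v):
    """Explicit form of f_{t,v}; xs = (x_1, ..., x_{t-1}) with x_1 = 1."""
    last = xs[-1]
    total = xlog2x(v - last) - xlog2x(v - 1 - last)
    for a, b in zip(xs, xs[1:]):
        total += (xlog2x(a) + xlog2x(v - a) - xlog2x(v - 1 - a + b)
                  - xlog2x(a - b) - xlog2x(b) - xlog2x(1 - b))
    return total


def feasible(xs):
    return all(1 >= a >= b >= 0 for a, b in zip(xs, xs[1:]))


def maximize_f(t, v, restarts=20, seed=0):
    """Crude hill climbing over 1 = x_1 >= x_2 >= ... >= x_{t-1} >= 0."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(restarts):
        xs = [1.0] + sorted((rng.random() for _ in range(t - 2)), reverse=True)
        val, step = f_tv(xs, v), 0.25
        while step > 1e-10:
            improved = False
            for i in range(1, t - 1):          # x_1 stays fixed at 1
                for delta in (step, -step):
                    cand = xs[:]
                    cand[i] += delta
                    if feasible(cand) and f_tv(cand, v) > val:
                        xs, val, improved = cand, f_tv(cand, v), True
            if not improved:
                step /= 2
        best = max(best, val)
    return best


def lemma14_bound(t, v):
    """(t-1) v / ((t-1) log2(v^v / (v-1)^(v-1)) - f_0(t, v)), with f_0 estimated."""
    base = xlog2x(v) - xlog2x(v - 1)
    return (t - 1) * v / ((t - 1) * base - maximize_f(t, v))


print(round(lemma14_bound(3, 2), 2), round(lemma14_bound(4, 2), 2))
```

Each term of f_{t,v} is a perspective of the concave entropy function, so f_{t,v} is concave, which is why such a simple local search is a plausible choice here.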


7 Analysis of results 


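Theorem 13 provides a new upper bound on d(t,v) for any t. This bound is an improvement on the current best general upper bound on d(t,v), derived in [8]. To see this, recall that
$$\ln(1+x) = x - \frac{x^2}{2} + O(x^3), \qquad \frac{1}{\ln(1+x)} = \frac{1}{x} + \frac{1}{2} + O(x)$$
for |x| < 1. Hence, for a fixed value of t,
$$\frac{t-1}{\log_2\left(\frac{v^t}{v^t-1}\right)} \approx (t-1)\left(v^t - \frac{1}{2}\right)\ln(2)$$
and
$$\frac{(t-1)v}{\log_2\left(\frac{v^{t-1}}{v^{t-1}-1}\right)} \approx (t-1)\left(v^t - \frac{v}{2}\right)\ln(2).$$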
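The quality of these first-order approximations can be checked numerically. The Python sketch below (function names are ours) compares the exact bounds of Theorems 3 and 13 with (t−1)(v^t − 1/2)ln 2 and (t−1)(v^t − v/2)ln 2; the exact values are computed with `math.log1p` to avoid cancellation when v^t is large.

```python
from math import log, log1p


def exact3(t, v):
    """(t - 1) / log2(v^t / (v^t - 1)), using log1p to avoid cancellation."""
    return (t - 1) * log(2) / -log1p(-1 / v ** t)


def exact13(t, v):
    """v (t - 1) / log2(v^(t-1) / (v^(t-1) - 1))."""
    return v * (t - 1) * log(2) / -log1p(-1 / v ** (t - 1))


def approx3(t, v):
    return (t - 1) * (v ** t - 0.5) * log(2)


def approx13(t, v):
    return (t - 1) * (v ** t - v / 2) * log(2)


for t, v in [(3, 2), (6, 2), (6, 7)]:
    print((t, v), exact3(t, v), approx3(t, v), exact13(t, v), approx13(t, v))
```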
v \ t       2         3          4           5             6
  2         1      7.56      27.32       79.74        209.13
  3      3.97     32.03     158.65      658.21       2503.83
  4      8.16     81.35     518.55     2816.81      14162.67
  5     13.72    163.91    1281.78     8635.15      54108.77
  6     20.65    288.03    2672.98    21523.56     161643.64
  7     28.98    462.05    4966.64    46555.89     407676.24
  8     38.68    694.28    8487.15    90802.26     908447.35
  9     49.78    993.05   13608.84   163661.74    1841749.21
 10     62.25   1366.68   20755.89   277195.09    3465640.41

Table 1: Upper bounds on d(t,v).


The better upper bound on $|\mathcal{A}_t|$ obtained in Section 6 yields the largest improvement when t = 2, since the over-counting is the least in this case. As above, we can easily approximate the bound obtained in Corollary 15 to get


$$\frac{v}{\log_2\left(\frac{v^v(v-2)^{v-2}}{(v-1)^{2(v-1)}}\right)} = \frac{v\ln(2)}{v\ln\left(\frac{v}{v-1}\right) - (v-2)\ln\left(\frac{v-1}{v-2}\right)} < v(v-1)\ln(2).$$


Hence, we get a tighter bound on d(2,v). However, note that the upper bound on d(2,v) for v ≥ 3 given in Corollary 15 is still quadratic in v, as is the bound given in Theorem 13. Recall that d(2,v) = v/2 [7]. Hence, even for strength t = 2, the obtained upper bound on d(2,v) is far from optimal. However, Algorithm 1 provides one major improvement over previous asymptotic constructions: when t = 2 and v = 2 we are able to count the non-covering arrays exactly, which gives us d(2,2) = 1 in Corollary 15. This indicates that Algorithm 1 might potentially yield asymptotically optimal covering arrays. But the current approximation of the size of $\mathcal{A}_t$, the set of N × t arrays with balanced columns which are not coverings, introduces substantial over-counting even in the easiest case, when the strength is t = 2. To see this in a different way, consider the examples of upper bounds on d(2,v) given in Table 2. We can see the improvements on the upper bounds on d(2,v) obtained in Theorem 13 and Corollary 15 compared to Theorem 3. The fourth row of Table 2 corresponds to a bound obtained by the following simple construction. Let Y be the collection of all 2-subsets of an alphabet V of size v. Then a covering array of strength 2 with k columns on the alphabet V can be constructed by juxtaposing $\binom{v}{2}$ isomorphic copies of a CA(N;2,k,2), one on the alphabet V' for every V' ∈ Y. Since d(2,2) = 1, we get $d(2,v) \le \binom{v}{2}d(2,2) = \frac{v(v-1)}{2} < v(v-1)\ln(2)$, which improves the general bound on d(2,v) obtained in Corollary 15. More advanced direct constructions of covering arrays of strength t = 2, especially when v is a prime power, provide covering arrays which yield even smaller bounds on d(2,v), which are still quadratic in v (for example, see [1]). The fifth row of Table 2 gives the slope of the least-squares regression line for the set of pairs (log₂ k, N) such that N is the smallest size for which a CA(N;2,k,v) is currently known (as given in the tables in [1]). We can see that these values are still far from the optimal asymptotic size given in the last row of Table 2, with the exception of v = 2.
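The juxtaposition construction is easy to verify by computer. In the Python sketch below (names are ours), the binary ingredient is built from the indicator vectors of the (n/2)-subsets of {0, …, n−1} that contain 0, a standard choice that we assume here rather than one taken from the paper; for n = 6 it gives a CA(6;2,10,2). One relabelled copy is then stacked for each 2-subset of V, giving binom(v,2)·6 rows.

```python
from itertools import combinations, product
from math import comb


def is_covering_array(rows, t, v):
    """Brute-force check that every N x t subarray contains all of V^t."""
    k = len(rows[0])
    need = set(product(range(v), repeat=t))
    return all(need <= {tuple(r[c] for c in cols) for r in rows}
               for cols in combinations(range(k), t))


def binary_strength2_ca(n):
    """A binary CA(n; 2, k, 2) for even n, with k = binom(n - 1, n/2 - 1) columns.

    Each column is the indicator vector of an (n/2)-subset of {0, ..., n-1}
    containing 0. Two distinct such subsets are incomparable (01 and 10 occur),
    share the element 0 (11 occurs) and have a union of size at most n - 1 (00 occurs).
    """
    subsets = [s for s in combinations(range(n), n // 2) if 0 in s]
    return [tuple(1 if r in s else 0 for s in subsets) for r in range(n)]


def juxtapose(binary_ca, v):
    """Stack one relabelled copy of `binary_ca` for every 2-subset {a, b} of range(v)."""
    rows = []
    for a, b in combinations(range(v), 2):
        rows.extend(tuple(a if x == 0 else b for x in row) for row in binary_ca)
    return rows


base = binary_strength2_ca(6)            # a CA(6; 2, 10, 2)
bigger = juxtapose(base, 4)              # binom(4, 2) * 6 = 36 rows, 10 columns
print(len(bigger), is_covering_array(bigger, 2, 4))  # 36 True
```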


We have seen that Theorem 13 provides an improvement on the upper bound for d(t,v) compared to the current best known result stated in Theorem 3, for any value of t. However, the improvement obtained is comparatively small as t increases (for example, see Table 3). On the other hand, the upper bounds obtained here predict the existence of covering arrays of smaller size than what is currently known. Indeed, Algorithm 1 terminates and outputs a proper covering array when (5.1) is satisfied. That means that, for given t, k and v, if m is such that the value in (6.1) is smaller than 1, a CA(vm;t,k,v) exists. Figure 1 plots the current best known sizes of covering arrays with t = 6 and v = 2 or v = 7, given in [1], against the sizes of covering arrays for which (6.1) is smaller than 1. We can see that for small values of k the current, predominantly computational, results produce covering arrays of smaller size. However, for large values of k we predict the existence of covering arrays with a much smaller number of rows.

d(2,v) \ v         2      3      4      5      6      7      8      9     10
Theorem 3       2.41   5.89  10.74  16.98  24.61  33.62  44.01  55.80  68.97
Theorem 13      2.0    5.13   9.64  15.53  22.81  31.48  41.53  52.96  65.79
Corollary 15    1      3.97   8.16  13.72  20.65  28.98  38.68  49.78  62.25
$\binom{v}{2}$  1      3      6     10     15     21     28     36     45
slope of regr.  1.02   2.84   5.15   7.935 11.83  15.49  19.55  21.99  25.83
Theorem 2       1      1.5    2      2.5    3      3.5    4      4.5    5

Table 2: Comparison of upper bounds on d(2,v).


d(6,v) \ v        2          3           4           5            6            7
Theorem 3    220.07    2524.79    14193.92    54150.39    161695.64    407738.63
Theorem 13   218.32    2521.32    14188.72    54143.46    161686.98    407728.23
Table 1      209.13    2503.83    14162.67    54108.77    161643.64    407676.24

Table 3: Comparison of upper bounds on d(6,v).




Figure 1: Comparison of sizes of covering arrays which can be constructed by Algorithm 1 with the currently best known sizes. Panels: (a) CAs with t = 6 and v = 2; (b) CAs with t = 6 and v = 7.
8 Conclusion


Determining the optimal size of a covering array for a given triple (t,k,v) and constructing optimal covering arrays have been the two central questions in this area of research. The interest in these two questions stems from the fact that covering arrays are natural models for interaction test suites, and hence they are extensively used in the blooming software testing industry. However, these two questions have proven to be a great challenge for both the combinatorial and the computer science research communities.

In this paper we tackled the problem of determining upper bounds on the asymptotic size of covering arrays using an algorithmic version of the local lemma. We determined a new general bound on d(t,v) (see Theorem 13), and we gave a tighter bound in Lemma 14 which depends on further numerical computation.

However, although we improve the existing upper bounds on the asymptotic size of covering arrays for strength t ≥ 3, in the simplest case t = 2 (where the over-counting is the least) the bounds we obtain are far from the optimum given by Theorem 2. The main challenge in improving these bounds is finding a better way to count the balanced arrays on t columns which are not t-coverings. A new view of this problem may lead to a better encoding of the information in the 'back-track' lines of the algorithm. Indeed, in the case t = 2 and v = 2, we are able to count these arrays exactly, and as a result this general algorithm produces covering arrays whose size is asymptotically optimal.